
    A Mocktail of Source Code Representations

    Efficient representation of source code is essential for various software engineering tasks such as code search and code clone detection. One technique for representing source code extracts paths from the abstract syntax tree (AST) and uses a learning model to capture program properties. Code2vec is a widely used path-based approach that uses an attention-based neural network to learn code embeddings, which can then be used for various software engineering tasks. However, this approach uses only ASTs and does not leverage other graph structures such as Control Flow Graphs (CFG) and Program Dependency Graphs (PDG). Similarly, most recent approaches for representing source code still use the AST and do not leverage semantic graph structures. Although an integrated graph approach (the Code Property Graph) exists for representing source code, it has only been explored in the domain of software security, and it does not leverage the paths from the individual graphs. In our work, we extend the path-based approach code2vec to include the semantic graphs CFG and PDG alongside the AST, a combination that is still largely unexplored in software engineering. We evaluate our approach on the method naming task using a custom C dataset of 730K methods collected from 16 C projects on GitHub. Compared with code2vec, our approach improves the F1 score by 11% on the full dataset and by up to 100% on individual projects. We show that semantic features from the CFG and PDG paths are indeed helpful. We envision that considering a mocktail of source code representations for various software engineering tasks can lay the foundation for a new line of research and an overhaul of existing research.
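    As a rough illustration of what a path-based representation consumes, the sketch below extracts code2vec-style path contexts, i.e. triples of (start terminal, AST path, end terminal), for every pair of terminals in a snippet. It is a sketch under stated assumptions, not the authors' extractor: it parses Python source with Python's standard `ast` module instead of parsing C, treats only identifiers and constants as terminals, and the helper names `terminals` and `path_contexts` are hypothetical. It covers only the AST part; the abstract does not describe how paths are drawn from the CFG and PDG.

```python
import ast
from itertools import combinations


def terminals(node, ancestors=()):
    """Yield (token, chain) for every terminal (identifier or constant),
    where chain is the tuple of AST nodes from the root down to it.
    Assumption: only ast.Name and ast.Constant count as terminals."""
    chain = ancestors + (node,)
    if isinstance(node, ast.Name):
        yield node.id, chain
    elif isinstance(node, ast.Constant):
        yield repr(node.value), chain
    else:
        for child in ast.iter_child_nodes(node):
            yield from terminals(child, chain)


def path_contexts(tree):
    """Return (start token, path, end token) triples for every pair of
    terminals. '^' marks a step up toward the lowest common ancestor and
    '_' a step down toward the second terminal."""
    contexts = []
    for (tok_a, chain_a), (tok_b, chain_b) in combinations(terminals(tree), 2):
        # k = length of the shared root prefix, so chain_a[k - 1] is the
        # lowest common ancestor (terminals never contain one another).
        k = 0
        while k < min(len(chain_a), len(chain_b)) and chain_a[k] is chain_b[k]:
            k += 1
        up = [type(n).__name__ for n in reversed(chain_a[k - 1:])]
        down = [type(n).__name__ for n in chain_b[k:]]
        path = "^".join(up) + "".join("_" + name for name in down)
        contexts.append((tok_a, path, tok_b))
    return contexts


if __name__ == "__main__":
    # Prints one triple per terminal pair, e.g. ('x', 'Name^BinOp_Constant', '1').
    for ctx in path_contexts(ast.parse("y = x + 1")):
        print(ctx)
```

    The real code2vec extractor also bounds path length and width and hashes each path before feeding it to the attention network; this sketch enumerates all pairs and keeps paths as plain strings for readability.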

    Diversity in Software Engineering Conferences and Journals

    Diversity with respect to ethnicity and gender has been studied in open-source and industrial software development settings. Publication venues such as academic conferences and journals contribute to the growing technology industry. However, very few diversity-related studies have been conducted in the context of academia. In this paper, we study the ethnic, gender, and geographical diversity of the authors published in Software Engineering conferences and journals. We provide a systematic quantitative analysis of the diversity of publications and of the organizing and program committees of three top conferences and two top journals in Software Engineering. This analysis indicates the existence of bias and entry barriers towards authors and committee members belonging to certain ethnicities, genders, and/or geographical locations in Software Engineering conference and journal publications. For our study, we analyse publication data (accepted authors) and committee data (program and organizing committees, and journal editorial boards) from the conferences ICSE, FSE, and ASE and the journals IEEE TSE and ACM TOSEM from 2010 to 2022. The data show that, across participants and committee members, some communities are consistently and significantly underrepresented, for example, publications from countries in Africa, South America, and Oceania. However, a correlation study between the diversity of the committees and that of the participants did not yield any conclusive evidence. Furthermore, there is no conclusive evidence that papers with White authors or male authors were more likely to be cited. Finally, we see an improvement in the ethnic diversity of the authors over the years 2010-2022, but not in gender or geographical diversity.
    Comment: 13 pages, 10 figures, 4 tables
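    The committee-versus-participant correlation study can be sketched as below, assuming (hypothetically) that each venue-year is summarized by the share of committee members from a given group and the share of accepted authors from the same group, and that Pearson's r is the correlation measure. The abstract does not state which diversity metric or correlation measure the authors used, and the name `pearson_r` is illustrative.

```python
from math import sqrt
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between two equal-length series, e.g. the yearly
    share of committee members from a group (xs) and the yearly share of
    accepted authors from the same group (ys). Illustrative only."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Unnormalized covariance and spreads; the 1/n factors cancel in the ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sd_x * sd_y)
```

    With only 13 yearly observations per venue (2010 to 2022), r would normally be reported together with a significance test. On Python 3.10 and later, `statistics.correlation` computes the same coefficient.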

    Android Access Control Recommendation as a Deep Learning Task

    Android enforces access control checks to protect sensitive framework APIs. If not properly protected, framework APIs can open the door for malicious apps to access sensitive resources without having the necessary privileges. Unfortunately, as reported in the existing literature, such access control anomalies are prevalent in Android APIs, notably those introduced by customization parties. Therefore, various solutions have been proposed to detect anomalies, particularly those due to inconsistencies in the enforcement of access checks across the Android framework(s). These solutions fall largely into two categories: convergence-based techniques, which rely on two APIs converging on similar resources, and probabilistic approaches, which incorporate additional hints in the form of manually defined structural and semantic code constructs. In this paper, we are motivated by the promise of using code constructs beyond convergence, as proposed by the probabilistic approaches, to recommend access control enforcement and detect inconsistencies. Specifically, we propose a deep learning-based approach that aims to automatically learn the correspondence between various code constructs and access control requirements. To this end, we fine-tune CodeBERT on statically derived features from the Android Open Source Project (AOSP). Our feature engineering process addresses various peculiarities that characterize Android implementations. The resulting fine-tuned model can be queried to recommend access control for vendor-customized APIs. The fine-tuned model achieves an accuracy of 93%, a precision of 91%, and a recall of 92% on the AOSP data. Additionally, our evaluation of custom ROMs shows that the model can not only rediscover previously reported inconsistencies but also discover new ones.
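    The querying step, in which the fine-tuned model recommends an access control for each vendor-customized API and mismatches with what the ROM actually enforces are reported, might look roughly like the sketch below. Everything in it is hypothetical: `ApiEntry`, `flag_inconsistencies`, and the label strings are illustrative names, and `recommend` is a placeholder for any callable wrapping the fine-tuned CodeBERT classifier. The abstract does not give the authors' feature format or label set.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class ApiEntry:
    """One framework API from a custom ROM (hypothetical record layout)."""
    name: str       # API identifier, e.g. "Service.method"
    features: str   # textual rendering of its statically derived features
    enforced: str   # access control label actually enforced in the ROM


def flag_inconsistencies(
    apis: Iterable[ApiEntry], recommend: Callable[[str], str]
) -> List[Tuple[str, str, str]]:
    """Return (name, enforced, recommended) for each API whose enforced
    access control differs from the model's recommendation."""
    flagged = []
    for api in apis:
        recommended = recommend(api.features)
        if recommended != api.enforced:
            flagged.append((api.name, api.enforced, recommended))
    return flagged


if __name__ == "__main__":
    # Stand-in for the fine-tuned model: a lookup table keyed by features.
    stub = {"reads device id": "PERM_READ", "sets volume": "PERM_WRITE"}.get
    rom_apis = [
        ApiEntry("Svc.getId", "reads device id", "PERM_READ"),
        ApiEntry("Svc.setVolume", "sets volume", "NONE"),
    ]
    print(flag_inconsistencies(rom_apis, stub))
    # prints [('Svc.setVolume', 'NONE', 'PERM_WRITE')]
```

    Keeping the model behind a plain callable makes the comparison logic testable without loading a network; flagged entries are candidates for manual review rather than confirmed vulnerabilities, since a mismatch can also reflect a model error.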